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In this paper we present the extension of an existing method for abstract graph-based state space 
exploration, called neighbourhood abstraction, with a reduction technique based on subsumption. 
Basically, one abstract state subsumes another when it covers more concrete states; in such a case, 
the subsumed state need not be included in the state space, thus giving a reduction. We explain the 
theory and especially also report on a number of experiments, which show that subsumption indeed 
drastically reduces both the state space and the resources (time and memory) needed to compute it. 

1 Introduction 

Traversal of the state space of systems is the cornerstone of many verification/analysis methods, e.g., 
model checking 1 1 ], and is therefore a subject under intense investigation. In particular, techniques for 
pruning the search space (e.g. , partial-order reduction [10]) and duplicate state detection (e.g. , collapsing 
under isomorphism lfl6l ) were shown to be essential ingredients in the fight to tame the ever looming 
problem of state space explosion. However, important classes of systems have infinite state spaces and 
therefore cannot be (fully) explored using traditional traversal techniques. 

One way to address this problem is to perform state abstraction, where "similar" concrete states 
are collapsed under an abstract representative, with the behaviour of the abstract state encompassing all 
possible behaviour of the collapsed concrete states. This notion of abstraction is the basis of well-known 
techniques such as abstract interpretation [5] and shape analysis GUI . 

State "similarity" is the point where many abstractions differ; in order to define what "similar" means 
one has to look at the underlying framework one uses to represent systems. In our case, we use graph 
transformation as the framework for modelling system behaviour and therefore our abstraction works 
over graphs. In this context of graph transformation, many theoretical studies on suitable abstractions 
have been proposed jl5[ [T7l |3] [I9j 111 111 ED- However, only the last three of these were backed-up by 
tool support. 

In previous work lfl"8l . we presented a prototype implementation of the neighbourhood abstraction 
theory developed in Q, as an extension of the GROOVE tool set |[T4l 191. The main functionality of 
GROOVE is the ability to explore the state space of graph transformation systems (more details in Sec- 
tion^, but a concrete exploration can only traverse part of the state space of an infinite state system. The 
abstraction extension allows GROOVE to generate a. finite abstract state space that over-approximates the 
original concrete one. Over-approximation guarantees soundness of the verification, i.e., properties that 
hold in the abstract domain also hold in the concrete counterpart. 

The main goal of the prototype implementation was to serve as a practical proof-of-concept of the 
theoretical ideas. Unsurprisingly, performance was not optimal and only a few small systems could be 
properly analysed. Since then, we completely re-implemented the abstraction code and incorporated 
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many performance improvements. This paper describes one of such improvements, based on the concept 
of state subsumption. We illustrate the performance gain provided by the subsumption with experiments 
using test cases from different areas of computer science. 

The rest of this paper is organised as follows. First, we present the key concepts of graph transforma- 
tion and abstraction in Section [2] In Section [3] we introduce the subsumption relation for abstract states 
and we show how it can be used during exploration. In Section |4j we present the experiments performed 
and analyse the results. Related work is discussed in Section[5] Finally, Section [6] concludes the paper. 



2 Graph Production Systems and Abstraction 

Graph transformation Q is a rewriting technique that operates over graphs. In its simplest form, a 
transformation rule consists of a left-hand side (LHS) and a right-hand side (RHS) graph and specifies 
the changes that should be performed to a host graph. Applying a rule r to a host graph G basically 
amounts to finding a match m of the LHS of r in G and replacing this matched part of G by the RHS of 

f ffi y 

r, thus producing a new graph H. We write G — > H to denote a rule application and we write G — > H 
if there exists a match m such that G — > H. 

A graph production system (or graph grammar) is formed by a set of graph transformation rules R and 
a start host graph. State space exploration of the grammar consists of performing all possible applications 
of the rules from R into the start graph, and repeating this process to all newly generated graphs. The state 
space obtained in this way can be represented by a Graph Transition System (GTS), which is a labelled 
transition system where states are host graphs and transitions are rule applications, i.e. , a pair of rule r and 
associated match m. Once generated, a GTS can be analysed as usual, for example by model checking 
properties written as temporal logic formulae (e.g., using Computation Tree Logic - CTL). Clearly, if 
the rewrite system modelled by a graph grammar is non-terminating, the state space is infinite and thus a 
GTS cannot be fully constructed by a normal exploration method. To handle infinite state systems, some 
form of abstraction is required. One of such abstractions, called neighbourhood abstraction, is presented 



in Section 2.2 First, we formalise the graph representation that we use. 

We assume the existence of a finite set of labels Lab, partitioned into unary and binary label sets, 
denoted Lab u and Lab B , respectively. We work with simple directed graphs, with labels taken from Lab. 

Definition 1 (Graph) A graph is a tuple G = (N,E), where N is a finite set of nodes and E C N x Lab x 

N is a finite set of directed labelled edges, such that (v, l,w) G E with I € Lab u implies v = w. < 

Given (v,l,w) G E, v and w are called source and target nodes, respectively; and / is the edge label. 
We simulate node labels with self-edges labelled with unary labels. Given v G N, the set of labels of 
node v, denoted lab(v), is defined as lab(v) = {/ G Lab u | (v,l,v) G E}. For convenience, we write E B to 
denote the set of edges with binary labels, i.e., E B = {(v,l,w) G E \ I G Lab B }. 



2.1 Example of a Graph Grammar 

As an example we use a graph grammar modelling the behaviour of a firewall in a network, taken 
from [12l. A firewall has inner and outer interfaces, to which locations can be connected. Locations are 
marked with the kind of interface they are connected to. Data are represented as packets, which can be 
transferred between locations or through the firewall. Packets can either be safe or unsafe. Safe packets 
can be at any location but unsafe packets cannot exist at inner locations. Figure [T] shows an example 
configuration of a network with simple abbreviations used for conciseness. The network has one outer 
location and two inner ones, and there are five packets being transmitted. 
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FW - firewall 


o — outer 


IF - interface 


/ - inner 


L - location 


s - safe 


P - packet 


u - unsafe 



Figure 1 : Concrete graph representing a possible state of a network with a firewall 
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Figure 2: Examples of transformation rules for the firewall grammar. 



Figure [2] shows four transformation rules of the grammar. The rules are in GROOVE single-graph 
notation, which uses colours and line formats to distinguish rule elements. Black (continuous thin) 
elements are matched and kept by rule application, blue elements (dashed thin) are matched and deleted, 
and green (continuous bold) elements are created^ Figures 2(a) and 2(b) show the rules for packet 
creation. A safe packet can be created at any location, whereas an unsafe packet can only be created at 
outer locations. Infinite behaviour stems from these two rules; since they are always enabled, an infinite 



number of packets can be created. Figure 2(c) depicts a rule for packet transfer between locations. Since 
all locations on each side of the firewall are of the same type, there is no need to distinguish between 
safe and unsafe packets. A dual rule (not shown here) transfers packets on the reverse direction of the 



connection edge, thus making the connection bi-directional. The rule in Figure 2(d) shows the firewall 
filter, that only allows safe packets to reach inner locations. 

2.2 Neighbourhood Abstraction 

Our notion of abstraction is based on neighbourhood similarity: nodes are considered equivalent if they 
have the same labels and the same number of incoming and outgoing edges. Graphs are abstracted by 
folding all equivalent nodes into one, while keeping count of their original number up to some bound of 
precision. Counting up to some bound is done using multiplicities. 

2.2.1 Multiplicities 

We use co to denote an upper bound on the set of natural numbers, i.e., co and for all k G N,k < 00. 
We write N ro as a short-hand notation for NU {oo}. We can then define simple arithmetic operations over 
N ffl , such as addition and subtraction. For example, given i,j £ N°\ 



i + j 



i + j ifi,y€N, 
CO otherwise. 



In the standard two-graph notation, the LHS is formed by black and blue elements and RHS by black and green elements. 



38 



Graph Subsumption in Abstract State Space Exploration 



The symbol + is overloaded: the one on the left represents addition over N ffl while the one on the right 
is the usual addition over N. Note that the first condition of the definition implies that i + j< CO. 

Definition 2 (Multiplicity) A multiplicity is an element of the set M = G (N x N w ) \ i < j}. < 

Multiplicities are used to represent an interval of consecutive values taken from N m , i.e., we write 
(i,j) as a compact representation for the set {k G N ro | i < k < j}. Given a multiplicity (i,j) G M: 

• if i = j, we call the multiplicity singleton and we use a short-hand notation by writing only the 
lower-bound i in bold, i.e., i. The singleton multiplicity 1 is called concrete; and 

• if j = CO, we use a short-hand notation by writing the lower-bound i in bold, super-scripted with 
+, i.e., i+. 

Set M is infinite, since i and j are taken from infinite sets. To ensure finiteness, we need to define a 
bound of precision, which limits the possible values of i and j. 

Definition 3 (Bounded multiplicity) A bounded multiplicity is an element of set M b C M, defined, for 
a given bound be N, asM h = {(i, j) G M | i < b + 1, j G {0, . . . ,b, co}}. < 

The theory of neighbourhood abstraction presented in is parameterised with two multiplicity 
bounds, for node and edge counting. In practice, these bounds are usually set to a low value, such as 1 
or 2, since they can greatly affect the size of the abstract state space. For the remainder of this paper we 
consider both bounds to be 1 and we only work with bounded multiplicities. This effectively limits the 
possible multiplicity values to six: 0, (0, 1), + , 1, 1 + , and 2 + . Any natural number can be projected to 
a bounded multiplicity by means of a simple approximation function. For a given set A, we write |A| to 
denote the bounded multiplicity approximated from the cardinality of set A. 

It is simple to define arithmetic operations over multiplicities based on the operations over N ffl . In 
order to later define the state subsumption relation (Section [3) we need the concept of multiplicity sub- 
sumption, which amounts to interval inclusion. Given two bounded multiplicities p., v G M b , let ii = 
and v = (i',f). We say that jj, is subsumed by v or that v subsumes jl, denoted fl C V, if i > i' and j < f. 

2.2.2 Neighbourhood Equivalence 

We begin this section introducing some additional notation. Given a graph G = (N,E), a node v G ./V, a 
binary label / G Lab B , and a set of nodes CCiV.we consider the following sets of edges: 

• out(v,/,C) = {(v,l,w) G E B | w G C}, i.e., the set of outgoing /-edges from v into nodes of C; and 

• in(v,/,C) = {(w,l,v) G E B \ w G C}, i.e., the set of incoming /-edges into v from nodes of C. 
Formally, to define neighbourhood similarity we create a neighbourhood equivalence relation = over 

graph nodes. 

Definition 4 (Neighbourhood equivalence relation) Given a graph G = (N,E), the neighbourhood 
equivalence relation = over nodes ofG is defined for two radii (with v, w G N): 

• v =o w /flab(v) = lab(w); and 

• v=i w ifv =o w, |out(v,/,C)| = |out(w,/,C)|, and |in(v,/,C)| = \\n(w,l,C)\, for all binary labels 
/ G Lab B and all sets of nodes C G N /=q. < 

From the definition, we see that two nodes are equivalent at radius if they have the same labels. 
Equivalence classes are then refined at radius 1 , where we look for the number of edges incoming from 
and outgoing to nodes of other equivalence classes. 

As with multiplicity bounds, the theory is also parameterised with an abstraction radius. However, 
experiments with the prototype implementation showed that increasing the radius above one is not feasi- 
ble in practice. The current abstraction implementation fixes the maximum radius to one. 
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Figure 3: Shape obtained when abstracting the graph of Figure [T] Node multiplicities are indicated by 
thickness: bold nodes have multiplicity 2 + ; thin nodes have multiplicity 1. Edge groups with multiplicity 
2 + are explicitly shown; groups without values in the figure have multiplicity 1. Edge multiplicities close 
to the edge arrow indicate incoming multiplicities from the opposite equivalence class. 



2.2.3 Shapes 

Our graph abstractions are called shapes, following the term defined in shape analysis ll20ll . A shape is a 
graph with some additional structure. 

Definition 5 (Shape) A shape is a tuple S = (G$, — s, mult£, mult^, mult^), where: 

• Gs = (Ns,Es) is the underlying graph structure of the shape; 

• —sQNsxNs is a similarity relation over nodes ofS; 

• mult£ : Ns — > M is a node multiplicity function, which records how many concrete nodes were 
folded into an abstract node; and 

• mult.5, mult^ : (Afc x Lab B x Ns/— S ) — are outgoing and incoming edge multiplicity functions, 
which record how many concrete edges with a certain label were folded into an abstract edge. < 

If a shape node v has an associated concrete multiplicity, i.e., mult£(v) = 1, then v is called concrete. 
Nodes that are not concrete are called collectors. 

A shape is constructed by abstracting a concrete host graph. Given a graph G, we first compute the 
neighbourhood equivalence relation = over G. The graph component Gs of the shape is constructed by 
folding the nodes of each equivalence class of =1, while recording the multiplicities of these classes in 
the multiplicity maps of S. The similarity relation ~ s is taken as =o^\ A detailed explanation on shape 
construction is given in Q. 

Figure [3] shows an example of a shape. The graph structure is drawn as usual. The similarity relation 
~5 is indicated with dashed boxes. Node multiplicities are represented by line thickness: bold nodes 
have multiplicity 2 + and thin nodes have multiplicity 1. Groups of edges with incoming multiplicity 2 + 
are explicitly identified; the remainder edge multiplicities are all equal to 1 and are not shown. The shape 
in Figure [3] is an abstraction of the graph from Figure [TJ where the safe packets at the top inner location 
and the unsafe packets at the outer location of the graph were collapsed into collectors nodes of the 
shape (since the multiplicity bounds are one, any natural greater than one is mapped to 2 + in the abstract 

2 The definition of a shape is "generic" in the sense that any binary relation on nodes can be used as the similarity relation ~. 
In this paper, however, we consider only shapes where the relation ~ is taken as the neighbourhood equivalence =q. 
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domairjj). An important point to note is that the shape in Figure |3| serves as an abstract representative 
not only for the graph in Figure [T]but also for any graph with two or more packets of the correct kind at 
the corresponding locations of the shape. Given a shape S, we write concr(S') to indicate the (possibly 
infinite) set of concretisations of S, i.e., the set of graphs that can be abstracted to S. 

3 Subsumption for State Space Reduction 

In this section we first present the general algorithm for abstract state space traversal and we show the 
current duplicate detection mechanism used in GROOVE, based on graph certificates and graph isomor- 
phism checks. We then proceed to explain the new method of subsumption collapsing for abstract states 
and we give a modified version for the traversal algorithm. This subsumption relation, along with the 
experimental results given in Section |4| constitute the new contributions of this paper. 

3.1 Abstract State Space Traversal 

Listing [T] gives the pseudo-code for exploring the abstract state space. Q is the set of all shapes and F 
the set of fresh, yet to be explored shapes; R is the set of rules, G the start graph, and P is the set of rule 
applications that were computed during exploration. 

Listing 1 : Algorithm for abstract state space exploration. 

let S := abstract(G), Q := 0, P := 0, F := {S} 
while F ^ 

do choose S G F // which S is selected depends on the exploration strategy 
let F := F\{S} 

for r G R, m G prematch(r,5'), S' E materialise^, S) 
do let T := normalise(apply(r,ra,S')) 

if isFresh(r,g,F) Z/ifTgQ 

then let Q : = gU{r}, F := FU{T} 

fi 

\etP := PU{S^T} 

od 

od 

The main phases in this algorithm are: 

• abstract computes the shape of a graph, as explained in the previous section. 

• prematch computes non-injective morphisms of a rule r into a shape 5. Such a morphism is not 
yet a match, because the images of r's LHS may be collector elements; in this case they have to be 
materialised. 

• materialise creates concrete nodes and edges for the image of r in S. This is a non-deterministic 
step, as there may be options for choosing multiplicities for the materialised elements. 

• apply is rule application, which can be carried out as usual because the rule now acts upon a 
concrete subgraph of S'. At this step, the match of the rule is injective. 

• normalise merges the transformed graph back into the rest of the shape; it is thus similar to abstract 
except that it acts upon a (partially materialised) shape rather than a graph. 

3 This loss of precision is intended, otherwise the abstraction would not be finite. 
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Each of these phases are explained in detail in |[T8l . For the purposes of this paper it suffices to focus on 
the operations at lines 3 and 7 of the algorithm given in Listing [T] 

Line 3 deals with the policy for selecting a shape S from the set F of shapes to be explored. We 
consider two policies, namely Breadth-First Search (BFS) and Depth-First Search (DFS). When using 
BFS, F is implemented as a queue, whereas in DFS F is a stack. We use the term exploration strategies 
to refer to these search policies. Section[4]gives an experimental comparison on the performance of these 
two strategies in abstract state space exploration. 

Line 7 handles the duplicate state detection mechanism. Procedure isFresh(T, Q,F) is responsible 
for checking if shape T (or an equivalent canonical representative) is already in the set Q of all explored 
shapes. Set Q can be quite large, so this check has to be implemented with care, since it can greatly 
impact performance. This is even more important when working with graph grammars; in particular, in 
GROOVE, states are collapsed under an isomorphic representative. Since graph isomorphism can be a 
rather expensive check, it cannot be performed over all elements of Q. 

Listing [2] gives the algorithm for procedure isFresh, as originally described in ifTol . This algorithm 
is based on graph certificates, which basically correspond to a hashing method tuned for graphs. In 
addition to being relatively inexpensive to compute, certificates are built in such a way that graph iso- 
morphism implies certificate equality. The converse, however, is not true, since the certificate function 
may produce false positives. In the algorithm of Listing [2j certificates are used to filter elements of set 
Q, thus producing a much smaller set Q C Q, composed only of graphs with the same certificate of T. 
We then proceed to check if there exists U G Q such that T and U are isomorphic (denoted T ~ U). If no 
such U is found then we can conclude that T is fresh. 

Listing 2: Algorithm for procedure isFresh(r, Q, F). 

let C := certificate^), Q := {U G Q \ C = certificate^)} 
for U G Q 

do if T ~ U //ifT and U are isomorphic 
then return false 
fi 

od 

return true // we checked all candidates but none are isomorphic to T, so T is fresh 

The method just described works very well in practice for concrete state space exploration and since 
shapes also have a graph structure, we can immediately reuse the same algorithm of Listing[2]for abstract 
exploration. However, shapes carry additional information that is not taken into account when using only 
isomorphism checks. To use this additional information, the notion of shape subsumption was developed. 

3.2 Shape Subsumption 

The key insight behind the shape subsumption relation (denoted by the same symbol C used for multi- 
plicity subsumption) lies in the comparison between the concretisations of shapes. Let S and T be two 
isomorphic shapes, with concr(S) C concr(r). Since T has more concretisations than S, rule applica- 
tions on T capture more behaviour than rule applications on S. In fact, all behaviour of S is subsumed by 
the behaviour of T, and therefore, in an abstract exploration we can discard S and only explore T. More 
formally, if S is subsumed by T then for all S — > U G P there exists T — > U' G P such that U C U', for 
any r G R. 

Subsumption is an asymmetric relation built upon isomorphism. Shape S is subsumed by shape T if: 
(i) S and T are isomorphic, (ii) they have the same node similarity relation, and (iii) all multiplicities in 
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S are subsumed by the multiplicities in T. Formally, we have the following definition, where, for v EN, 
we write [v]~ to denote the equivalence class of v induced by ~, i.e., [v]~ = {w E N | v ~ w}. 

Definition 6 (Shape subsumption) Given two shapes S and T, S is subsumed by T, denoted S CT, if: 

• there exists an isomorphism <p : Gs ~ Gr between the graph structures of the shapes; 

• for any (v,w) G ~s> (<P( V )) <P(w)) G ~r; 

• multS(v) C mu\t T ((p(v)), for all v G Afc; and 

• multg(v,/,[w]^) E mult£(p(v),/,[p(w)]~ r ) and multk(v,/,[w]~ 5 ) C mul^(p(v),/,[p(w)]~ r ), 
/or a// (v, l,w) G £"5. -4 

Note that 5 C T implies concr(5') C concr(r). As a simple example, take S as a shape containing 
only one node of multiplicity 2 + and no edges. Then take T also as a shape with a single node, but 
with multiplicity 1 + . From the definition, we have that S C T. Set concr(5) contains graphs with two or 
more nodes, whereas set concr(T) has one more element, namely the graph with just one node. Hence, 
concr(S) C concr(r). 

Shapes that subsume one another, i.e., shapes that are isomorphic and have the same multiplicities 
for all elements, are called strictly isomorphic. From the above it follows that strictly isomorphic shapes 
have the same concretisations (see the accompanying technical report of |3] for the proof). 

Listing [3] shows the modified algorithm for procedure isFresh, with shape subsumption checks incor- 
porated. Given a new shape T that must be tested for freshness, we begin as before, by constructing set 
Q, consisting of the shapes in Q with the same certificate as T. In addition, we initialise an auxiliary 
set B, to store shapes from Q that were identified as subsumed by T. Since subsumption is an asym- 
metric relation, we must check it in both directions (lines 3 and 5 of the algorithm). Note, however, 
that these two subsumption checks do not require two isomorphism checks, since we can first look for 
an isomorphism between T and U (the potentially most expensive operation) and then proceed to check 
both subsumptions using the same isomorphism. 

Listing 3: Algorithm for procedure is Fresh (T, Q, F) with subsumption check. 

let C := certificate^), Q := {U e Q \ C = certificate^)}, B := 
for U G Q 

do if T n. U //ifT is subsumed by U 
then return false 

else if U C T //ifU is subsumed by T 

then let B := BU {U} // mark U as subsumed 
fi 

fi 

od 

F := F \ B // remove the states marked as subsumed from the set of states to be explored 
return true 

An interesting aspect of the new isFresh procedure is that it can now modify the set F of shapes to 
be explored. If there exists U G Q such that T QU , then T is not fresh and can be discarded as before 
(lines 3 and 4 of Listing [3]). However, if we discover that U QT , then not only we know that T is fresh, 
but also that U should not be explored, since all its behaviour is subsumed by T. We then mark U as 
subsumed by adding it to set B (line 6) and continue looking for other subsumed shapes in Q. At the end 
of the procedure we remove all shapes marked as subsumed from F (line 10), thus trimming the search 
space. 
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4 Experiments and Results 

The theory of neighbourhood abstraction ensures that the number of shapes for any graph grammar is 
finite, and therefore that the abstract state space is also finite Q. However, the theoretical upper bound 
of the abstract state space size is still quite large, meaning that in practice we have to optimise the 
state space traversal in order to implement an efficient tool. In |[T8ll we described the major aspects of 
the implementation of neighbourhood abstraction in GROOVE (though without shape subsumption), and 
reported a few experiments. In this section we present more of such experiments, that illustrate: (i) 
the upper bounds on abstract state space sizes that are reachable in practice, (ii) the performance gains 
that are obtained with the shape subsumption technique presented in Section [3] and (iii) how different 
exploration strategies perform in the abstract setting. 

For our experiments we collected 8 graph grammars, from various problem domains. Some of these 
problems can also be solved with other abstractions, but they are usually tuned with certain characteristics 
from the domain at hancJ^J Our abstraction extension for GROOVE, on the other hand, is generic, in the 
sense that the concept of neighbourhood equivalence is always applicable, but perhaps with varying 
performance. Here is a list with a short description of the grammars used. 

• linked-list: a grammar modelling operations on a single -linked list structure. Elements can always 
be appended at the end of the list, which can thus grow unbounded. 

• circ-buf-0: a grammar modelling a circular buffer structure with an unbounded number of cells. 
Cell usage is marked with special labels, without reference to stored objects. 

• circ-buf-1: a variant of the circular buffer where the stored objects are explicitly represented. 

• euler-0: a grammar that can construct Euler's walks of arbitrary size. Adapted from the classical 
Konigsberg bridges problem from graph theory. 

• euler-1: a variant of the Euler grammar without explicitly representing connecting bridges. 

• firewall- [2-6]: the grammar of our firewall example. Network structure is fixed, while packages 
are collapsed by the abstraction. Instances vary on the number of locations: from 2 to 6. 

• firewall-6-F: variant of the firewall grammar with a network of six fully connected locations. 

• car-platoon: grammar simulating a wireless communication protocol between cars, for estab- 
lishing platoons in highways. Cars can enter and leave a platoon at any time, which leads to an 
exponential growth on the number of possible configurations. 

Table[T]gives all the figures on state space sizes for the grammars listed above. Numbers for the BFS 
and DFS exploration strategies are grouped per grammar, to ease the comparison between the two. State 
space sizes are divided in two groups of values: the number of explored states, i.e., the number of shapes 
produced, and the number of transitions between states, i.e., the count of rule applications. State and 
transition counts in Table [T] are broken down in five and three types, respectively. 

The following five types of state count are given in columns 3-7 of Table [T] 

• Maximum is the upper bound on the number of abstract states of the grammar. This number is 
obtained by exploring the state space without shape subsumption, i.e., by running the exploration 
algorithm of Listing [I] with the original isFresh procedure of Listing [2j Thus, these are the figures 
that would have been reported by the prior implementation of [ 18 ]. Empty entries in this column 



Shape analysis, for example, is designed to work on heap pointer structures, which are deterministic graphs. In our setting, 
this correspond to having shapes with all outgoing edge multiplicities limited to 1. 
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Table 1: State space sizes for the explorations performed with different graph grammars. 
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for the larger cases of the firewall grammar indicate that the upper bound could not be computed: 
these runs timed out after several hours of execution, due to state space explosion. Note that 
we give only one maximum state count for both BFS and DFS strategies. The reason is that all 
states are explored when subsumption is off, and therefore the maximum state count is the same, 
regardless of the strategy used. The upper bound provided by this column gives an interesting 
basis of comparison when analysing the reduction obtained with subsumption. 

• Generated is the number of states produced during exploration using shape subsumption, i.e., the 
exploration algorithm of Listing [T] was run with the new is Fresh procedure of Listing [3] Numbers 
in this column correspond to the size of set Q, i.e., the number of states that were added to the set 
of all explored states (line 8 in Listing [T]). When comparing the number of generated states against 
the maximum upper bound we can see the reduction given by subsumption. Take, for example, 
the firewall-4 line, where the number of generated states using DFS with subsumption is an order 
of magnitude smaller than the maximum upper bound obtained without subsumption. The gain 
provided by subsumption can also be seen for the larger cases of the firewall grammar: runs that 
timed-out without subsumption can now be finished when we turn it on. 

• Subsumed is the number of states generated that were later marked as subsumed by another state. 
They correspond to states that are added to set B at line 6 of Listing [3j 

• Relevant is the number of states that were never marked as subsumed during the exploration. This 
column corresponds to the Generated column minus the Subsumed one. The closer the number 
of generated states gets to the relevant state count the better. A perfect exploration method would 
generate only the relevant abstract states, since they are sufficient to cover all concrete behaviour. 

• Discarded is the number of states that were marked as subsumed and were removed from the set 
of states to be explored. This number corresponds to the sum of all states removed from F at line 
10 of Listing [3] 
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Figure 4: Accumulated percentages of the number of states relative to the maximum state space size, for 
the firewall grammar instances with a known maximum state count. 



The remainder columns of Table [T] give the number of transitions outgoing from the states in the associ- 
ated state count columns. 

When comparing the figures in Table [T] for BFS and DFS exploration, it is clear that DFS gives 
a much better performance. The DFS generated state count is smaller than the BFS count in all but 
one test (circ-buf-0), and as we move to grammars with larger state spaces the advantage increases 
greatly, until reaching two orders of magnitude for the firewall-6-F case. The reason for this performance 
difference between BFS and DFS is simple. Usually, the more a shape is transformed by subsequent rule 
applications the more abstract it becomes, until it reaches a fix-point, i.e., further rule applications yield 
the same shape again. These more abstract shapes capture more concrete behaviour and thus can subsume 
other shapes in the state space. As a rule-of-thumb, more abstract shapes are more likely to be part of 
the set of relevant states, and since they are only discovered after a succession of rule applications, these 
relevant states are deeper in the state space. Therefore, DFS is more likely to reach these states first. This 
fact can be seen from the numbers in the Discarded column: BFS generates a lot of states that are later 
discarded. This is wasted effort: in BFS a state is produced and added to sets Q and F but it is very likely 
that later it is going to be removed from F (while remaining in Q). On the other hand, since DFS already 
found more abstract shapes, it is more probable that the search will immediately throw a new state away, 
without storing it on Q, since the new state will be subsumed by some other state already in Q. 

From the discussion above, one may wonder why shapes that were marked as subsumed are kept 
in set Q. The reason is that removing states from Q could leave "dangling" transitions in the set of 
generated transitions P. A possible solution could be the following. For S G Q and T fresh, if S C T then 
we should take all transitions in P with S as a source or target and replace S by T. This on-the-fly state 
space collapsing under subsumption is not provided by the current implementation, but the tool offers a 
simpler option: reachability mode. In this mode we are only interested in the shapes that are reachable 
in the abstract state space, and thus there is no need to store the transitions, i.e., set P is kept empty. In 
this case there is no danger of having "dangling" transitions and we can remove subsumed shapes from 
Q, thus decreasing memory usage. Reachability mode was a late addition to the implementation and as 
such its experimental analysis is left as future work. 

Figure [4] gives a visual aid for the comparison between BFS and DFS, for the firewall grammar 
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Table 2: Running time and memory consumption, with and without state subsumption checks. 
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instances with a known maximum state count. The figure shows a stacked bar chart with the accumulated 
percentages of the number of states relative to the maximum state space size. From this chart we see that 
the percentage of states generated with DFS decreases as the start graph size increases. On the other hand, 
the percentage of states generated with BFS remains roughly the same, at around 40% the maximum (this 
can be seen from the interval sizes of the generated BFS bars in Figure|4]). 

Other metrics that must be analysed in a performance evaluation are running time and memory con- 
sumption. Results for these measurements are given in Table |2j for runs with and without state sub- 
sumption checks. The experiments were performed in a machine with a Intel Xeon X5365 CPU running 
at 3 GHz and a total 32 GB of RAM. Blank entries indicate timed-out executions. From the numbers 
in Table [2] we see the performance improvement given by subsumption: running times for the firewall 
grammar are two orders of magnitude smaller when subsumption is used, and memory consumption is 
also reduced. When comparing the running times for BFS and DFS, we see that both strategies have a 
similar performance when subsumption is not used but when it is turned on, DFS is far more efficient 
than BFS, both in execution time and memory consumption. This performance figures are directly re- 
lated to the number of states generated by each strategy: DFS produces far fewer states than BFS, which 
translates to a large performance gain. This can be confirmed visually with the chart in Figure [5J where 
the running times for the firewall grammar are plotted against start graph sizes. Clearly, DFS has a much 
more tamed growth (note that the time axis is in a logarithmic scale). 

5 Related Work 

Abstraction is an essential ingredient in nearly all methods for system analysis and verification and as 
such there is a vast body of work describing the use of abstractions in different domains. In this section 
we give a (non-exhaustive) discussion on related work that involves state space traversal and graphs. 

In QT], Holte et al. tackle the area of problem solving in artificial intelligence, which boils down 
to finding the shortest path between a start and a goal state. There is a relation of opposition between 
solution quality (the path length) and search effort (states traversed) and many heuristics can be used 
to guide the search. The authors define a so-called "explicit graph notation", where the state space is 
represented by a labelled transition system (LTS), and they proceed to define abstraction algorithms that 
can be used to speed-up the search. One of such algorithms, called STAR, works by building state 
classes that are connected up to a certain abstraction radius. Despite having many similar concepts with 
our work, the abstractions used by Holte are not state abstractions; they operate on the LTS level and not 
on the state representation. Furthermore, the concrete state spaces considered in [ 1 1] are always finite. 
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Figure 5: Running time (with subsumption on) versus start graph size for the firewall grammar. 



In 0, Edelkamp et al. consider the problem of partial analysis/exploration of the state space of 
graph transformation systems. As in the work of Holte et al., this amounts to a guided search over the 
concrete state space where abstraction can be used as an heuristic. Properties of interest for the analysis 
usually encompass existential checks for graph structures; e.g., is a graph with a certain node and edge 
configuration reachable from the start state? Any abstraction that preserves reachability of the goal 
state in the abstract state space can be used to define an heuristic for the guided search in the concrete 
level. Since our neighbourhood abstraction preserves reachability, it could in principle be used as the 
abstraction mechanism for an heuristic search. However, performance may be an issue, since computing 
the transitions of an abstract state is a rather expensive operation. 

Concerning the verification of infinite-state graph transformation systems, Konig et al. have an ex- 
tensive corpus of work, starting with Given a graph grammar their analysis technique extracts an 
approximated unfolding; a finite structure (called Petri graph) that is composed of a hyper-graph and a 
Petri net. The Petri graph captures all structure that can occur in the reachable graphs of the system, 
and dependencies for rule applications are recorded by the Petti net transitions. The final Petri graph 
obtained is an over-approximation that can be used to check safety properties in the original system. If 
a spurious counter-example is introduced by the over-approximation, the abstraction can be incremen- 
tally refined lfl2l . These techniques are implemented in the tool AUGUR which is now in its second 
version ||T3l . An experimental comparison between this tool and our implementation is considered as 
future work. 



6 Conclusions and Future Work 

In this paper we present an abstraction technique for the exploration of graph transformation systems 
with infinite state spaces. We explain the main points of neighbourhood abstraction as implemented in 
GROOVE and we propose a new method for state collapsing, based on the concept of shape subsumption. 
Experimental results show that subsumption gives a significant reduction on the number of states that 
have to be explored, thus improving both the running time and memory consumption of the tool. Fur- 
thermore, the experiments also show that the choice of the exploration strategy has a heavy influence on 
performance, with DFS giving much better results. 
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We see the results presented in this paper as an important achievement over the original implemen- 
tation of abstraction in GROOVE. As any tool developer would know, performance improvements in 
programs that deal with highly combinatorial problems such as state space exploration usually involve a 
painstaking cycle of refactorings, experimentation and fine-tuning. Our case was no different, where the 
original abstraction code had to be rewritten from scratch in order to accommodate shape subsumption. 
A further improvement over the code from lfl"8ll is that rules with NACs (negative application conditions) 
are now also supported, which increases rule expressivity. 

There are many directions where the current research/tool can be extended. Aside from the usual 
points, such as additional experimentation with more test cases and comparison with other tools, we 
consider the following items as future work. 

• Stronger notion of subsumption. The subsumption relation presented here depends on the ex- 
istence of an isomorphism between two shapes. This dependence can be weakened by requiring 
only the existence of an embedding morphism between the shapes, which is not an isomorphism 
but instead an injective sub-graph morphism, similar to a rule match. This weakening of the sub- 
sumption pre-condition makes the relation stronger, and thus should lead to further reductions of 
the state space. This new relation, however, requires additional refactoring of the code, since we 
can no longer re-use the isomorphism checking package from GROOVE. 

• More expressive notions of abstraction. While neighbourhood abstraction can be used for many 
different classes of problems, it does not fare very well when some structural properties should be 
preserved by the abstraction. It cannot, for example, represent connectivity information between 
nodes. When the abstraction does not limit the possible concrete structures that can be generated, 
all cases have to be considered and this leads to a blow-up in the abstract state space size that can 
cripple performance. We can see this from the results for the car-platoon grammar in Table [T] 
the number of states in the state space is too large, and execution was aborted due to an out-of- 
memory error. To tackle these problems, other notions of abstraction are thus in order. We are 
currently working on the theory for a pattern based abstraction, a method that will allow certain 
graph structures of interest to be preserved in the abstract domain. 

Availability. The current abstraction extension described in this paper is implemented in GROOVE ver- 
sion 4.4.6, available at http : //groove . cs . utwente . nl The grammars for the experiments described 
in Section [4] along with the results obtained can also be downloaded at the same address. 
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